REFRACTIVE: An Open Source Tool to Extract Knowledge from Syntactic and Semantic Relations
نویسندگان
چکیده
The extraction of semantic propositions has proven instrumental in applications like IBM Watson (Ferrucci, 2012) and in Google’s knowledge graph (Singhal, 2012). One of the core components of IBM Watson is the PRISMATIC knowledge base consisting of one billion propositions extracted from the English version of Wikipedia and the New York Times (Fan et al., 2010). However, extracting the propositions from the English version of Wikipedia is a time-consuming process. In practice, this task requires multiple machines and a computation distribution involving a good deal of system technicalities. In this paper, we describe REFRACTIVE, an open-source tool to extract propositions from a parsed corpus based on the Hadoop variant of MapReduce. While the complete process consists of a parsing part and an extraction part, we focus here on the extraction from the parsed corpus and we hope this tool will help computational linguists speed up the development of applications.
منابع مشابه
برچسبزنی نقش معنایی جملات فارسی با رویکرد یادگیری مبتنی بر حافظه
Abstract Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...
متن کاملA text categorisation tool for open source communities based on semantic analysis
Open source software (OSS) projects are supported by communities interacting through software repositories and mailing lists. Thousands of contributors participate in the development of the projects although they rarely meet each other. The result is a huge archived repository with thousands of questions, answers and contributions usually difficult to explore. We propose a tool based on semanti...
متن کاملA Language Model for Extracting Implicit Relations
Open Information Extraction has shown promise of overcoming a knowledge engineering bottleneck, but has a fundamental limitation. It is unable to extract implicit relations, where the sentence lacks an explicit relation phrase. We present IMPLIE (Implicit relation Information Extraction) that uses an open-domain syntactic language model and user-supplied semantic taggers to overcome this limita...
متن کاملبرچسبزنی خودکار نقشهای معنایی در جملات فارسی به کمک درختهای وابستگی
Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
متن کاملPresenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014